BacPrep: An Experimental Platform for Evaluating LLM-Based Bacalaureat Assessment
Dumitran, Adrian Marius, Dita, Radu
Access to quality preparation and feedback for the Romanian Bacalaureat exam is challenging, particularly for students in remote or underserved areas. This paper introduces BacPrep, an experimental online platform exploring the potential of Large Language Models (LLMs) for automated assessment, aiming to offer a free, accessible resource. Using official exam questions from the last five years, BacPrep employs one of Google's newest models, Gemini 2.0 Flash (released February 2025), guided by official grading schemes, to provide experimental feedback. Currently operational, its primary research function is collecting student solutions and LLM outputs. This focused dataset is vital for planned expert validation to rigorously evaluate the feasibility and accuracy of this cutting-edge LLM in the specific Bacalaureat context before reliable deployment.
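As a rough illustration of the scheme-guided grading call such a platform might make, the sketch below uses the google-generativeai Python SDK; the prompt wording, question text, and grading-scheme format are placeholders, not BacPrep's actual implementation.

```python
# Hypothetical sketch: grading a Bacalaureat answer with Gemini 2.0 Flash,
# guided by an official grading scheme. Prompt and data are placeholders.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # assumption: key supplied by the caller
model = genai.GenerativeModel("gemini-2.0-flash")

def grade_solution(question: str, grading_scheme: str, student_answer: str) -> str:
    """Ask the model to score an answer against the official scheme."""
    prompt = (
        "You are grading a Romanian Bacalaureat exam answer.\n"
        f"Question:\n{question}\n\n"
        f"Official grading scheme (point breakdown):\n{grading_scheme}\n\n"
        f"Student answer:\n{student_answer}\n\n"
        "Award points per scheme item, justify each award, and give a total."
    )
    response = model.generate_content(prompt)
    return response.text
```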
LLMs cannot spot math errors, even when allowed to peek into the solution
Srivatsa, KV Aditya, Maurya, Kaushal Kumar, Kochmar, Ekaterina
Large language models (LLMs) demonstrate remarkable performance on math word problems, yet they have been shown to struggle with meta-reasoning tasks such as identifying errors in student solutions. In this work, we investigate the challenge of locating the first error step in stepwise solutions using two error-reasoning datasets: VtG and PRM800K. Our experiments show that state-of-the-art LLMs struggle to locate the first error step in student solutions even when given access to the reference solution. To address this, we propose an approach that generates an intermediate corrected student solution, aligned more closely with the original student's solution, which helps improve performance.
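A minimal sketch of this two-stage idea, assuming a generic `call_llm` chat backend (a placeholder, not the authors' code): first elicit a minimally corrected solution that mirrors the student's steps, then take the first diverging step as the error.

```python
# Illustrative sketch: correct the student's stepwise solution while staying
# close to its structure, then locate the first step where the original
# diverges from that correction.
from typing import Callable

def first_error_step(student_steps: list[str],
                     reference_solution: str,
                     call_llm: Callable[[str], str]) -> int:
    # Stage 1: an intermediate corrected solution aligned with the student's
    # own wording and step structure.
    joined = "\n".join(f"Step {i + 1}: {s}" for i, s in enumerate(student_steps))
    corrected = call_llm(
        "Minimally rewrite this solution so it is correct, keeping the same "
        "number of steps and phrasing where possible.\n"
        f"Reference solution:\n{reference_solution}\n\nStudent solution:\n{joined}"
    )
    corrected_steps = [line.split(":", 1)[1].strip()
                       for line in corrected.splitlines() if ":" in line]
    # Stage 2: the first step that differs is taken as the first error.
    for i, (orig, fixed) in enumerate(zip(student_steps, corrected_steps)):
        if orig.strip() != fixed:
            return i  # 0-indexed first error step
    return -1  # no error found
```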
Benchmarking Large Language Models on Homework Assessment in Circuit Analysis
Chen, Liangliang, Qin, Zhihao, Guo, Yiming, Rohde, Jacqueline, Zhang, Ying
Large language models (LLMs) have the potential to revolutionize various fields, including code development, robotics, finance, and education, due to their extensive prior knowledge and rapid advancements. This paper investigates how LLMs can be leveraged in engineering education. Specifically, we benchmark the capabilities of different LLMs, including GPT-3.5 Turbo, GPT-4o, and Llama 3 70B, in assessing homework for an undergraduate-level circuit analysis course. We have developed a novel dataset consisting of official reference solutions and real student solutions to problems from various topics in circuit analysis. To overcome the limitations of image recognition in current state-of-the-art LLMs, the solutions in the dataset are converted to LaTeX format. Using this dataset, we design a prompt template to evaluate student solutions on five metrics: completeness, method, final answer, arithmetic error, and units. The results show that GPT-4o and Llama 3 70B perform significantly better than GPT-3.5 Turbo across all five metrics, with GPT-4o and Llama 3 70B each having distinct advantages in different evaluation aspects. Additionally, we present insights into the limitations of current LLMs in several aspects of circuit analysis. Given the paramount importance of ensuring reliability in LLM-generated homework assessment to avoid misleading students, our results establish benchmarks and offer valuable insights for the development of a reliable, personalized tutor for circuit analysis -- a focus of our future work. Furthermore, the proposed evaluation methods can be generalized to a broader range of courses in engineering education.
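The sketch below illustrates what such a five-metric prompt template might look like for LaTeX-formatted solutions; the JSON output contract and the `call_llm` placeholder are assumptions for illustration, not the paper's exact template.

```python
# Sketch of a five-metric grading prompt over LaTeX solutions. `call_llm`
# stands in for GPT-4o or Llama 3 70B; the JSON contract is an assumption.
import json

METRICS = ["completeness", "method", "final_answer", "arithmetic_error", "units"]

def assess_homework(reference_latex: str, student_latex: str, call_llm) -> dict:
    prompt = (
        "Compare the student solution to the reference solution, both in LaTeX.\n"
        f"Reference:\n{reference_latex}\n\nStudent:\n{student_latex}\n\n"
        "Return a JSON object with boolean fields: " + ", ".join(METRICS) + "."
    )
    verdict = json.loads(call_llm(prompt))
    return {m: bool(verdict.get(m, False)) for m in METRICS}
```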
PyEvalAI: AI-assisted evaluation of Jupyter Notebooks for immediate personalized feedback
Wandel, Nils, Stotko, David, Schier, Alexander, Klein, Reinhard
Grading student assignments in STEM courses is a laborious and repetitive task for tutors, often requiring a week to assess an entire class. For students, this delay of feedback prevents iterating on incorrect solutions, hampers learning, and increases stress when exercise scores determine admission to the final exam. Recent advances in AI-assisted education, such as automated grading and tutoring systems, aim to address these challenges by providing immediate feedback and reducing grading workload. However, existing solutions often fall short due to privacy concerns, reliance on proprietary closed-source models, lack of support for combining Markdown, LaTeX and Python code, or excluding course tutors from the grading process. To overcome these limitations, we introduce PyEvalAI, an AI-assisted evaluation system, which automatically scores Jupyter notebooks using a combination of unit tests and a locally hosted language model to preserve privacy. Our approach is free, open-source, and ensures tutors maintain full control over the grading process. A case study demonstrates its effectiveness in improving feedback speed and grading efficiency for exercises in a university-level course on numerics.
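A simplified sketch of this hybrid scoring idea, assuming a `local_llm_score` placeholder for the locally hosted model and a caller-supplied registry of unit tests; only the nbformat usage reflects a real API.

```python
# Rough sketch: score code cells with unit tests, pass Markdown answers to a
# locally hosted model. `local_llm_score` and the test registry are placeholders.
import nbformat

def grade_notebook(path: str, unit_tests: dict, local_llm_score) -> float:
    nb = nbformat.read(path, as_version=4)
    namespace, score = {}, 0.0
    for cell in nb.cells:
        if cell.cell_type == "code":
            exec(cell.source, namespace)       # run the student's code cells
    for name, test in unit_tests.items():      # e.g. {"exercise_1": callable}
        try:
            test(namespace)                    # raises AssertionError on failure
            score += 1.0
        except Exception:
            pass
    # Free-text answers (Markdown cells) go to the local model for a 0..1 score.
    for cell in nb.cells:
        if cell.cell_type == "markdown":
            score += local_llm_score(cell.source)
    return score
```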
Evaluating GPT-4 at Grading Handwritten Solutions in Math Exams
Caraeni, Adriana, Scarlatos, Alexander, Lan, Andrew
Recent advances in generative artificial intelligence (AI) have shown promise in accurately grading open-ended student responses. However, few prior works have explored grading handwritten responses due to a lack of data and the challenge of combining visual and textual information. In this work, we leverage state-of-the-art multi-modal AI models, in particular GPT-4o, to automatically grade handwritten responses to college-level math exams. Using real student responses to questions in a probability theory exam, we evaluate GPT-4o's alignment with ground-truth scores from human graders using various prompting techniques. We find that while providing rubrics improves alignment, the model's overall accuracy is still too low for real-world settings, showing there is significant room for growth in this task.
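A minimal sketch of rubric-conditioned multimodal grading with the OpenAI Python SDK; the rubric wording and requested score format are illustrative assumptions, not the study's exact prompts.

```python
# Sketch: send a handwritten-answer image plus a rubric to GPT-4o and ask for
# a score. Prompt text and score format are placeholders.
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def grade_handwritten(image_path: str, question: str, rubric: str) -> str:
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text":
                    f"Question:\n{question}\nRubric:\n{rubric}\n"
                    "Grade the handwritten answer in the image; report a score."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content
```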
Stepwise Verification and Remediation of Student Reasoning Errors with Large Language Model Tutors
Daheim, Nico, Macina, Jakub, Kapur, Manu, Gurevych, Iryna, Sachan, Mrinmaya
Large language models (LLMs) present an opportunity to scale high-quality personalized education to all. A promising approach toward this goal is to build dialog tutoring models that scaffold students' problem-solving. However, even though existing LLMs perform well at solving reasoning questions, they struggle to precisely detect students' errors and to tailor their feedback to these errors. Inspired by real-world teaching practice, where teachers identify student errors and customize their responses accordingly, we focus on verifying student solutions and show how grounding generation in such verification improves the overall quality of tutor responses. We collect a dataset of 1K stepwise math reasoning chains with the first error step annotated by teachers. We show empirically that finding the mistake in a student solution is challenging for current models. We propose and evaluate several verifiers for detecting these errors. Using both automatic and human evaluation, we show that the student solution verifiers steer the generation model toward highly targeted responses to student errors, which are more often correct and contain fewer hallucinations than existing baselines.
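The sketch below illustrates verification-grounded generation under these assumptions: a generic `call_llm` placeholder stands in for both the verifier and the tutor model, and the prompts are invented for illustration.

```python
# Sketch: a verifier proposes the first error step, and its verdict is
# injected into the tutor prompt so the response targets the actual mistake.
def verify(problem: str, student_steps: list[str], call_llm) -> str:
    joined = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(student_steps))
    return call_llm(
        f"Problem: {problem}\nStudent steps:\n{joined}\n"
        "Name the first incorrect step number and the mistake, or say 'correct'."
    )

def tutor_reply(problem: str, student_steps: list[str], call_llm) -> str:
    verdict = verify(problem, student_steps, call_llm)
    # Grounding the generator in the verifier's verdict targets the real error.
    return call_llm(
        f"Problem: {problem}\nVerifier finding: {verdict}\n"
        "Write one scaffolding tutor turn that addresses this specific error "
        "without revealing the full solution."
    )
```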
Estimating Difficulty Levels of Programming Problems with Pre-trained Model
Wang, Zhiyuan, Zhang, Wei, Wang, Jun
As the demand for programming skills grows across industry and academia, students often turn to Programming Online Judge (POJ) platforms for coding practice and competition. The difficulty level of each programming problem serves as an essential reference for guiding students' adaptive learning. However, current methods of determining difficulty levels either require extensive expert annotation or take a long time to accumulate enough student solutions per problem. To address this issue, we formulate the problem of automatically estimating the difficulty level of a programming problem given its textual description and an example solution in code. To tackle this problem, we propose coupling two pre-trained models, one for the text modality and one for the code modality, into a unified model. We built two POJ datasets for the task, and the results demonstrate the effectiveness of the proposed approach and the contributions of both modalities.
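One plausible reading of this coupling, sketched in PyTorch: encode the problem statement and the example solution with separate pre-trained encoders and classify their concatenated [CLS] embeddings. The model names and the fusion-by-concatenation head are assumptions, not necessarily the paper's architecture.

```python
# Illustrative two-encoder difficulty classifier; model names are assumptions.
import torch
import torch.nn as nn
from transformers import AutoModel

class DifficultyEstimator(nn.Module):
    def __init__(self, num_levels: int = 5):
        super().__init__()
        self.text_encoder = AutoModel.from_pretrained("bert-base-uncased")
        self.code_encoder = AutoModel.from_pretrained("microsoft/codebert-base")
        hidden = (self.text_encoder.config.hidden_size
                  + self.code_encoder.config.hidden_size)
        self.classifier = nn.Linear(hidden, num_levels)

    def forward(self, text_inputs: dict, code_inputs: dict) -> torch.Tensor:
        # [CLS] embedding from each modality, concatenated then classified.
        t = self.text_encoder(**text_inputs).last_hidden_state[:, 0]
        c = self.code_encoder(**code_inputs).last_hidden_state[:, 0]
        return self.classifier(torch.cat([t, c], dim=-1))
```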
MathDial: A Dialogue Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems
Macina, Jakub, Daheim, Nico, Chowdhury, Sankalan Pal, Sinha, Tanmay, Kapur, Manu, Gurevych, Iryna, Sachan, Mrinmaya
While automatic dialogue tutors hold great potential for making education personalized and more accessible, research on such systems has been hampered by a lack of sufficiently large and high-quality datasets. Collecting such datasets remains challenging, as recording tutoring sessions raises privacy concerns and crowdsourcing leads to insufficient data quality. To address this, we propose a framework for generating such dialogues by pairing human teachers with a Large Language Model (LLM) prompted to represent common student errors. We describe how we use this framework to collect MathDial, a dataset of 3k one-to-one teacher-student tutoring dialogues grounded in multi-step math reasoning problems. While models like GPT-3 are good problem solvers, they fail at tutoring because they generate factually incorrect feedback or are prone to revealing solutions to students too early. To overcome this, we let teachers provide learning opportunities to students by guiding them with various scaffolding questions according to a taxonomy of teacher moves. We demonstrate that MathDial and its extensive annotations can be used to fine-tune models to be more effective tutors (and not just solvers). We confirm this through automatic and human evaluation, notably in an interactive setting that measures the trade-off between student solving success and telling solutions. The dataset is released publicly.
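A toy sketch of such a collection loop: an LLM role-plays a student with a given error profile while a human teacher supplies the tutoring turns. The prompt wording and the `call_llm`/`teacher_input` hooks are illustrative assumptions, not the authors' framework.

```python
# Sketch: pair a human teacher with an LLM-simulated student who tends to
# commit a specified error. All prompts and hooks are placeholders.
def student_turn(problem: str, error_profile: str,
                 dialogue: list[str], call_llm) -> str:
    history = "\n".join(dialogue)
    return call_llm(
        f"Role-play a student solving: {problem}\n"
        f"You tend to make this mistake: {error_profile}\n"
        f"Dialogue so far:\n{history}\nReply as the student, staying in character."
    )

def collect_dialogue(problem: str, error_profile: str, call_llm,
                     teacher_input=input, max_turns: int = 6) -> list[str]:
    dialogue: list[str] = []
    for _ in range(max_turns):
        dialogue.append("Student: " + student_turn(problem, error_profile,
                                                   dialogue, call_llm))
        dialogue.append("Teacher: " + teacher_input("Teacher turn: "))
    return dialogue
```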
Teaching UML Skills to Novice Programmers Using a Sample Solution Based Intelligent Tutoring System
Schramm, Joachim, Strickroth, Sven, Le, Nguyen-Thinh, Pinkwart, Niels (all Clausthal University of Technology)
Modeling skills are essential in the process of learning programming. Intelligent tutoring systems (ITSs) for modeling are typically hard to build due to the ill-definedness of most modeling tasks. This paper presents a system that can teach UML skills to novice programmers. The system is "simple and cheap" in the sense that it only requires an expert solution against which student solutions are compared, yet it is flexible enough to accommodate the degree of solution flexibility and variability characteristic of modeling tasks. An empirical evaluation via a controlled lab study showed that the system worked reliably and, while not producing significant learning gains compared to a control condition, revealed some promising results.
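A toy sketch of sample-solution-based comparison with tolerance for naming variability, using only the standard library; the data shapes, matching threshold, and feedback messages are invented for illustration.

```python
# Sketch: fuzzily map student class names onto the expert solution, then
# check associations against the expert model. All shapes are illustrative.
from difflib import SequenceMatcher

def fuzzy_match(a: str, b: str, threshold: float = 0.8) -> bool:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def compare_uml(expert: dict, student: dict) -> list[str]:
    """Each model: {"classes": [...], "associations": [(src, dst), ...]}."""
    feedback = []
    mapping = {}  # student class -> matched expert class
    for sc in student["classes"]:
        match = next((ec for ec in expert["classes"] if fuzzy_match(sc, ec)), None)
        if match:
            mapping[sc] = match
        else:
            feedback.append(f"Unexpected class: {sc}")
    for ec in expert["classes"]:
        if ec not in mapping.values():
            feedback.append(f"Missing class: {ec}")
    mapped = {(mapping.get(s), mapping.get(d)) for s, d in student["associations"]}
    for src, dst in expert["associations"]:
        if (src, dst) not in mapped:
            feedback.append(f"Missing association: {src} -> {dst}")
    return feedback
```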